Adjust heuristic for line-based versus byte-based diffing (#299) #312

rgosdin · 2022-11-15T04:56:26Z

Switch the internal interface for reporting to use a push-pop mechanism. This API is strictly more flexible than the previous Report-only API since it enables the reporter to re-construct the entire value tree. This change is done in preparation for a major overhaul of reporter logic.

Rename the file based on what it does rather than how its implemented.

Some of the value information is already in the path (e.g., MapIndex.Key and SliceIndex.Key), which seems to suggest that all of the value information should be in the Path. Doing so has several benefits: * It simplifies the comparison logic * It paves the way for FilterPath to be able to ignore missing slice elements or map entries * It allows for a simpler API for custom reporters One regression introduced by this change is the removal of batching for slices when sufficient number of elements differ. This logic was added as a hack in the past to keep the reporter output small in such cases. However, this batching functionality should be the responsibility of the reporter logic, not the comparer logic. A future change will refactor the reporter logic to reintroduce that feature.

This is necessary for the upcoming reporter re-implementation so that it can format the differences with more intelligence. No tests added in this PR since an upcoming change will heavily test these code paths.

Rather than checking up-front whether the values are invalid, give tryOptions a chance to operate on them and potentially ignore the values. This is useful since a value can only be invalid if it is a missing slice element or map entry, and provides FilterPath combined with Ignore the ability to ignore such cases. Some complexity is added to compareSlice to look for ignored elements first before applying diffing so that we can decouple the stability of the diffing algorithm from the primary result.

Add a benchmark that uses cmp.{Equal,Diff} to measure performance of comparing large byte slices. Comparing large slices of primitives is generally considered the most pathological use-case for cmp since the rules of cmp dictate that it must run the filters on every element of the slice even if the filters never apply. In theory, a user of cmp could short-circuit comparison of slices by providing a comparer for slices (e.g., cmp.Comparer(bytes.Equal)), but it would be ideal for cmp to be sufficiently optimized that comparing large slices is not too costly. BenchmarkBytes/4KiB/EqualFilter0-4 300 3792755 ns/op 2.16 MB/s 628702 B/op 8276 allocs/op BenchmarkBytes/4KiB/EqualFilter1-4 200 6058637 ns/op 1.35 MB/s 890920 B/op 16471 allocs/op BenchmarkBytes/4KiB/EqualFilter2-4 200 7498098 ns/op 1.09 MB/s 890952 B/op 16472 allocs/op BenchmarkBytes/4KiB/EqualFilter3-4 100 10094237 ns/op 0.81 MB/s 891016 B/op 16473 allocs/op BenchmarkBytes/4KiB/EqualFilter4-4 100 11128655 ns/op 0.74 MB/s 891016 B/op 16473 allocs/op BenchmarkBytes/4KiB/EqualFilter5-4 100 12952062 ns/op 0.63 MB/s 891144 B/op 16474 allocs/op BenchmarkBytes/4KiB/DiffFilter0-4 300 4015119 ns/op 2.04 MB/s 629425 B/op 8305 allocs/op BenchmarkBytes/4KiB/DiffFilter1-4 200 6457963 ns/op 1.27 MB/s 891738 B/op 16501 allocs/op BenchmarkBytes/4KiB/DiffFilter2-4 200 8371022 ns/op 0.98 MB/s 891770 B/op 16502 allocs/op BenchmarkBytes/4KiB/DiffFilter3-4 200 9420398 ns/op 0.87 MB/s 891834 B/op 16503 allocs/op BenchmarkBytes/4KiB/DiffFilter4-4 100 12493540 ns/op 0.66 MB/s 891835 B/op 16503 allocs/op BenchmarkBytes/4KiB/DiffFilter5-4 100 14208928 ns/op 0.58 MB/s 891965 B/op 16504 allocs/op BenchmarkBytes/64KiB/EqualFilter0-4 20 66146276 ns/op 1.98 MB/s 10826734 B/op 131206 allocs/op BenchmarkBytes/64KiB/EqualFilter1-4 10 102687117 ns/op 1.28 MB/s 15021123 B/op 262281 allocs/op BenchmarkBytes/64KiB/EqualFilter2-4 10 125444269 ns/op 1.04 MB/s 15021174 B/op 262282 allocs/op BenchmarkBytes/64KiB/EqualFilter3-4 10 154392619 ns/op 0.85 MB/s 15021219 B/op 262283 allocs/op BenchmarkBytes/64KiB/EqualFilter4-4 10 183307772 ns/op 0.72 MB/s 15021219 B/op 262283 allocs/op BenchmarkBytes/64KiB/EqualFilter5-4 5 212334761 ns/op 0.62 MB/s 15021340 B/op 262284 allocs/op BenchmarkBytes/64KiB/DiffFilter0-4 20 67147375 ns/op 1.95 MB/s 10828312 B/op 131244 allocs/op BenchmarkBytes/64KiB/DiffFilter1-4 10 105385709 ns/op 1.24 MB/s 15022724 B/op 262319 allocs/op BenchmarkBytes/64KiB/DiffFilter2-4 10 135994379 ns/op 0.96 MB/s 15022763 B/op 262320 allocs/op BenchmarkBytes/64KiB/DiffFilter3-4 10 159346836 ns/op 0.82 MB/s 15022820 B/op 262321 allocs/op BenchmarkBytes/64KiB/DiffFilter4-4 10 190129916 ns/op 0.69 MB/s 15022843 B/op 262321 allocs/op BenchmarkBytes/64KiB/DiffFilter5-4 5 208805873 ns/op 0.63 MB/s 15022996 B/op 262322 allocs/op BenchmarkBytes/1MiB/EqualFilter0-4 2 978302544 ns/op 2.14 MB/s 173184064 B/op 2097346 allocs/op BenchmarkBytes/1MiB/EqualFilter1-4 1 1543190869 ns/op 1.36 MB/s 240293104 B/op 4194502 allocs/op BenchmarkBytes/1MiB/EqualFilter2-4 1 1998443802 ns/op 1.05 MB/s 240293232 B/op 4194504 allocs/op BenchmarkBytes/1MiB/EqualFilter3-4 1 2507293058 ns/op 0.84 MB/s 240293328 B/op 4194506 allocs/op BenchmarkBytes/1MiB/EqualFilter4-4 1 2981132381 ns/op 0.70 MB/s 240293008 B/op 4194502 allocs/op BenchmarkBytes/1MiB/EqualFilter5-4 1 3351177035 ns/op 0.63 MB/s 240293424 B/op 4194506 allocs/op BenchmarkBytes/1MiB/DiffFilter0-4 1 1132136753 ns/op 1.85 MB/s 173185752 B/op 2097384 allocs/op BenchmarkBytes/1MiB/DiffFilter1-4 1 1666196345 ns/op 1.26 MB/s 240294504 B/op 4194537 allocs/op BenchmarkBytes/1MiB/DiffFilter2-4 1 2204467232 ns/op 0.95 MB/s 240294600 B/op 4194539 allocs/op BenchmarkBytes/1MiB/DiffFilter3-4 1 2499107753 ns/op 0.84 MB/s 240294600 B/op 4194539 allocs/op BenchmarkBytes/1MiB/DiffFilter4-4 1 2966222324 ns/op 0.71 MB/s 240295016 B/op 4194544 allocs/op BenchmarkBytes/1MiB/DiffFilter5-4 1 3382045549 ns/op 0.62 MB/s 240294728 B/op 4194540 allocs/op BenchmarkBytes/16MiB/EqualFilter0-4 1 16585516720 ns/op 2.02 MB/s 2967917664 B/op 33554689 allocs/op BenchmarkBytes/16MiB/EqualFilter1-4 1 23980880452 ns/op 1.40 MB/s 4041659536 B/op 67109123 allocs/op BenchmarkBytes/16MiB/EqualFilter2-4 1 30729382462 ns/op 1.09 MB/s 4041659568 B/op 67109124 allocs/op BenchmarkBytes/16MiB/EqualFilter3-4 1 37830223988 ns/op 0.89 MB/s 4041660016 B/op 67109129 allocs/op BenchmarkBytes/16MiB/EqualFilter4-4 1 44731081109 ns/op 0.75 MB/s 4041659536 B/op 67109124 allocs/op BenchmarkBytes/16MiB/EqualFilter5-4 1 52110015114 ns/op 0.64 MB/s 4041659760 B/op 67109126 allocs/op BenchmarkBytes/16MiB/DiffFilter0-4 1 20349410654 ns/op 1.65 MB/s 2967919128 B/op 33554724 allocs/op BenchmarkBytes/16MiB/DiffFilter1-4 1 27073250483 ns/op 1.24 MB/s 4041661320 B/op 67109163 allocs/op BenchmarkBytes/16MiB/DiffFilter2-4 1 32223912220 ns/op 1.04 MB/s 4041661064 B/op 67109160 allocs/op BenchmarkBytes/16MiB/DiffFilter3-4 1 39189759283 ns/op 0.86 MB/s 4041661128 B/op 67109161 allocs/op BenchmarkBytes/16MiB/DiffFilter4-4 1 48344470628 ns/op 0.69 MB/s 4041661256 B/op 67109163 allocs/op BenchmarkBytes/16MiB/DiffFilter5-4 1 51184873999 ns/op 0.66 MB/s 4041661256 B/op 67109162 allocs/op The last benchmark shows that the current implementation allocates an astonishing 3.75GiB just to compare a 16MiB byte slice.

Avoid appending validator in every call to tryOptions and instead ensure that the state starts with one in the options list. Doing so dramatically reduces the number of allocations. benchmark old MB/s new MB/s speedup BenchmarkBytes/4KiB/EqualFilter0-4 2.16 2.56 1.19x BenchmarkBytes/4KiB/EqualFilter1-4 1.35 1.53 1.13x BenchmarkBytes/4KiB/EqualFilter2-4 1.09 1.20 1.10x BenchmarkBytes/4KiB/EqualFilter3-4 0.81 0.99 1.22x BenchmarkBytes/4KiB/EqualFilter4-4 0.74 0.77 1.04x BenchmarkBytes/4KiB/EqualFilter5-4 0.63 0.69 1.10x BenchmarkBytes/4KiB/DiffFilter0-4 2.04 2.38 1.17x BenchmarkBytes/4KiB/DiffFilter1-4 1.27 1.52 1.20x BenchmarkBytes/4KiB/DiffFilter2-4 0.98 1.15 1.17x BenchmarkBytes/4KiB/DiffFilter3-4 0.87 0.96 1.10x BenchmarkBytes/4KiB/DiffFilter4-4 0.66 0.81 1.23x BenchmarkBytes/4KiB/DiffFilter5-4 0.58 0.69 1.19x BenchmarkBytes/64KiB/EqualFilter0-4 1.98 2.31 1.17x BenchmarkBytes/64KiB/EqualFilter1-4 1.28 1.71 1.34x BenchmarkBytes/64KiB/EqualFilter2-4 1.04 1.23 1.18x BenchmarkBytes/64KiB/EqualFilter3-4 0.85 0.93 1.09x BenchmarkBytes/64KiB/EqualFilter4-4 0.72 0.83 1.15x BenchmarkBytes/64KiB/EqualFilter5-4 0.62 0.70 1.13x BenchmarkBytes/64KiB/DiffFilter0-4 1.95 2.29 1.17x BenchmarkBytes/64KiB/DiffFilter1-4 1.24 1.56 1.26x BenchmarkBytes/64KiB/DiffFilter2-4 0.96 1.14 1.19x BenchmarkBytes/64KiB/DiffFilter3-4 0.82 0.97 1.18x BenchmarkBytes/64KiB/DiffFilter4-4 0.69 0.79 1.14x BenchmarkBytes/64KiB/DiffFilter5-4 0.63 0.70 1.11x BenchmarkBytes/1MiB/EqualFilter0-4 2.14 2.40 1.12x BenchmarkBytes/1MiB/EqualFilter1-4 1.36 1.63 1.20x BenchmarkBytes/1MiB/EqualFilter2-4 1.05 1.12 1.07x BenchmarkBytes/1MiB/EqualFilter3-4 0.84 0.81 0.96x BenchmarkBytes/1MiB/EqualFilter4-4 0.70 0.72 1.03x BenchmarkBytes/1MiB/EqualFilter5-4 0.63 0.64 1.02x BenchmarkBytes/1MiB/DiffFilter0-4 1.85 2.07 1.12x BenchmarkBytes/1MiB/DiffFilter1-4 1.26 1.49 1.18x BenchmarkBytes/1MiB/DiffFilter2-4 0.95 1.00 1.05x BenchmarkBytes/1MiB/DiffFilter3-4 0.84 0.88 1.05x BenchmarkBytes/1MiB/DiffFilter4-4 0.71 0.72 1.01x BenchmarkBytes/1MiB/DiffFilter5-4 0.62 0.67 1.08x BenchmarkBytes/16MiB/EqualFilter0-4 2.02 2.16 1.07x BenchmarkBytes/16MiB/EqualFilter1-4 1.40 1.56 1.11x BenchmarkBytes/16MiB/EqualFilter2-4 1.09 1.06 0.97x BenchmarkBytes/16MiB/EqualFilter3-4 0.89 0.88 0.99x BenchmarkBytes/16MiB/EqualFilter4-4 0.75 0.76 1.01x BenchmarkBytes/16MiB/EqualFilter5-4 0.64 0.64 1.00x BenchmarkBytes/16MiB/DiffFilter0-4 1.65 2.15 1.30x BenchmarkBytes/16MiB/DiffFilter1-4 1.24 1.35 1.09x BenchmarkBytes/16MiB/DiffFilter2-4 1.04 0.97 0.93x BenchmarkBytes/16MiB/DiffFilter3-4 0.86 0.87 1.01x BenchmarkBytes/16MiB/DiffFilter4-4 0.69 0.75 1.09x BenchmarkBytes/16MiB/DiffFilter5-4 0.66 0.65 0.98x benchmark old allocs new allocs delta BenchmarkBytes/4KiB/EqualFilter0-4 8276 83 -99.00% BenchmarkBytes/4KiB/EqualFilter1-4 16471 84 -99.49% BenchmarkBytes/4KiB/EqualFilter2-4 16472 85 -99.48% BenchmarkBytes/4KiB/EqualFilter3-4 16473 85 -99.48% BenchmarkBytes/4KiB/EqualFilter4-4 16473 86 -99.48% BenchmarkBytes/4KiB/EqualFilter5-4 16474 86 -99.48% BenchmarkBytes/4KiB/DiffFilter0-4 8305 111 -98.66% BenchmarkBytes/4KiB/DiffFilter1-4 16501 112 -99.32% BenchmarkBytes/4KiB/DiffFilter2-4 16502 113 -99.32% BenchmarkBytes/4KiB/DiffFilter3-4 16503 113 -99.32% BenchmarkBytes/4KiB/DiffFilter4-4 16503 114 -99.31% BenchmarkBytes/4KiB/DiffFilter5-4 16504 114 -99.31% BenchmarkBytes/64KiB/EqualFilter0-4 131206 133 -99.90% BenchmarkBytes/64KiB/EqualFilter1-4 262281 134 -99.95% BenchmarkBytes/64KiB/EqualFilter2-4 262282 135 -99.95% BenchmarkBytes/64KiB/EqualFilter3-4 262283 135 -99.95% BenchmarkBytes/64KiB/EqualFilter4-4 262283 136 -99.95% BenchmarkBytes/64KiB/EqualFilter5-4 262284 137 -99.95% BenchmarkBytes/64KiB/DiffFilter0-4 131244 171 -99.87% BenchmarkBytes/64KiB/DiffFilter1-4 262319 172 -99.93% BenchmarkBytes/64KiB/DiffFilter2-4 262320 173 -99.93% BenchmarkBytes/64KiB/DiffFilter3-4 262321 173 -99.93% BenchmarkBytes/64KiB/DiffFilter4-4 262321 174 -99.93% BenchmarkBytes/64KiB/DiffFilter5-4 262322 174 -99.93% BenchmarkBytes/1MiB/EqualFilter0-4 2097346 192 -99.99% BenchmarkBytes/1MiB/EqualFilter1-4 4194502 193 -100.00% BenchmarkBytes/1MiB/EqualFilter2-4 4194504 196 -100.00% BenchmarkBytes/1MiB/EqualFilter3-4 4194506 194 -100.00% BenchmarkBytes/1MiB/EqualFilter4-4 4194502 198 -100.00% BenchmarkBytes/1MiB/EqualFilter5-4 4194506 196 -100.00% BenchmarkBytes/1MiB/DiffFilter0-4 2097384 229 -99.99% BenchmarkBytes/1MiB/DiffFilter1-4 4194537 231 -99.99% BenchmarkBytes/1MiB/DiffFilter2-4 4194539 233 -99.99% BenchmarkBytes/1MiB/DiffFilter3-4 4194539 234 -99.99% BenchmarkBytes/1MiB/DiffFilter4-4 4194544 233 -99.99% BenchmarkBytes/1MiB/DiffFilter5-4 4194540 233 -99.99% BenchmarkBytes/16MiB/EqualFilter0-4 33554689 256 -100.00% BenchmarkBytes/16MiB/EqualFilter1-4 67109123 255 -100.00% BenchmarkBytes/16MiB/EqualFilter2-4 67109124 257 -100.00% BenchmarkBytes/16MiB/EqualFilter3-4 67109129 257 -100.00% BenchmarkBytes/16MiB/EqualFilter4-4 67109124 258 -100.00% BenchmarkBytes/16MiB/EqualFilter5-4 67109126 257 -100.00% BenchmarkBytes/16MiB/DiffFilter0-4 33554724 292 -100.00% BenchmarkBytes/16MiB/DiffFilter1-4 67109163 293 -100.00% BenchmarkBytes/16MiB/DiffFilter2-4 67109160 294 -100.00% BenchmarkBytes/16MiB/DiffFilter3-4 67109161 294 -100.00% BenchmarkBytes/16MiB/DiffFilter4-4 67109163 295 -100.00% BenchmarkBytes/16MiB/DiffFilter5-4 67109162 299 -100.00% benchmark old bytes new bytes delta BenchmarkBytes/4KiB/EqualFilter0-4 628702 366504 -41.70% BenchmarkBytes/4KiB/EqualFilter1-4 890920 366536 -58.86% BenchmarkBytes/4KiB/EqualFilter2-4 890952 366601 -58.85% BenchmarkBytes/4KiB/EqualFilter3-4 891016 366602 -58.86% BenchmarkBytes/4KiB/EqualFilter4-4 891016 366730 -58.84% BenchmarkBytes/4KiB/EqualFilter5-4 891144 366728 -58.85% BenchmarkBytes/4KiB/DiffFilter0-4 629425 367149 -41.67% BenchmarkBytes/4KiB/DiffFilter1-4 891738 367180 -58.82% BenchmarkBytes/4KiB/DiffFilter2-4 891770 367244 -58.82% BenchmarkBytes/4KiB/DiffFilter3-4 891834 367245 -58.82% BenchmarkBytes/4KiB/DiffFilter4-4 891835 367372 -58.81% BenchmarkBytes/4KiB/DiffFilter5-4 891965 367372 -58.81% BenchmarkBytes/64KiB/EqualFilter0-4 10826734 6632425 -38.74% BenchmarkBytes/64KiB/EqualFilter1-4 15021123 6632443 -55.85% BenchmarkBytes/64KiB/EqualFilter2-4 15021174 6632502 -55.85% BenchmarkBytes/64KiB/EqualFilter3-4 15021219 6632540 -55.85% BenchmarkBytes/64KiB/EqualFilter4-4 15021219 6632640 -55.84% BenchmarkBytes/64KiB/EqualFilter5-4 15021340 6632707 -55.84% BenchmarkBytes/64KiB/DiffFilter0-4 10828312 6633995 -38.73% BenchmarkBytes/64KiB/DiffFilter1-4 15022724 6634049 -55.84% BenchmarkBytes/64KiB/DiffFilter2-4 15022763 6634091 -55.84% BenchmarkBytes/64KiB/DiffFilter3-4 15022820 6634081 -55.84% BenchmarkBytes/64KiB/DiffFilter4-4 15022843 6634241 -55.84% BenchmarkBytes/64KiB/DiffFilter5-4 15022996 6634222 -55.84% BenchmarkBytes/1MiB/EqualFilter0-4 173184064 106075104 -38.75% BenchmarkBytes/1MiB/EqualFilter1-4 240293104 106075088 -55.86% BenchmarkBytes/1MiB/EqualFilter2-4 240293232 106075344 -55.86% BenchmarkBytes/1MiB/EqualFilter3-4 240293328 106075152 -55.86% BenchmarkBytes/1MiB/EqualFilter4-4 240293008 106075504 -55.86% BenchmarkBytes/1MiB/EqualFilter5-4 240293424 106075376 -55.86% BenchmarkBytes/1MiB/DiffFilter0-4 173185752 106076648 -38.75% BenchmarkBytes/1MiB/DiffFilter1-4 240294504 106076776 -55.86% BenchmarkBytes/1MiB/DiffFilter2-4 240294600 106076936 -55.86% BenchmarkBytes/1MiB/DiffFilter3-4 240294600 106076936 -55.86% BenchmarkBytes/1MiB/DiffFilter4-4 240295016 106076968 -55.86% BenchmarkBytes/1MiB/DiffFilter5-4 240294728 106076968 -55.86% BenchmarkBytes/16MiB/EqualFilter0-4 2967917664 1894175792 -36.18% BenchmarkBytes/16MiB/EqualFilter1-4 4041659536 1894175696 -53.13% BenchmarkBytes/16MiB/EqualFilter2-4 4041659568 1894175856 -53.13% BenchmarkBytes/16MiB/EqualFilter3-4 4041660016 1894175856 -53.13% BenchmarkBytes/16MiB/EqualFilter4-4 4041659536 1894175984 -53.13% BenchmarkBytes/16MiB/EqualFilter5-4 4041659760 1894175888 -53.13% BenchmarkBytes/16MiB/DiffFilter0-4 2967919128 1894177352 -36.18% BenchmarkBytes/16MiB/DiffFilter1-4 4041661320 1894177352 -53.13% BenchmarkBytes/16MiB/DiffFilter2-4 4041661064 1894177448 -53.13% BenchmarkBytes/16MiB/DiffFilter3-4 4041661128 1894177448 -53.13% BenchmarkBytes/16MiB/DiffFilter4-4 4041661256 1894177576 -53.13% BenchmarkBytes/16MiB/DiffFilter5-4 4041661256 1894177832 -53.13%

…ted options (#115) A common scenario is: 1. someone uses cmp.Diff on big.Int (transitively) 2. they get a message that says uses Ignore/Allow Unexported options 3. they see that AllowUnexported is hard to use correctly 4. they use IgnoreUnexported They end up with: cmpopts.IgnoreUnexported(big.Int{}) Which definitely doesn't do what's intended. If we point out that a custom comparer is what they most likely need, then they are more likely to use cmp correctly

The previous implementation of the reporter simply listed all differences, each qualified by the full path to the difference. This method of reporting is exact, but difficult for humans to parse. It is one of the more common sources of complaints by users and a significant reason why cmp is not preferred over competing libraries. This change reimplements the reporter to format the output as a structured literal in pseudo-Go syntax. The output resembles literals that the user would likely have in their test code. Differences between the x and y values are denoted by a '-' or '+' prefix at the start of the line. An overview of the new implementation is as follows: * report.go: The defaultReporter type implements the Reporter interface. * report_value: Through the PushStep/PopStep API, the defaultReporter is able to contruct an in-memory valueNode tree representing the comparison of x and y as cmp.Equal walks the sub-values. * report_compare.go: After report_value.go constructs an AST-representation of the compared values, report_compare.go formats the valueNode tree as a textNode tree, which is the textual output in a tree form. Some relevant design decisions include: * The format logic goes through effort to avoid printing ignored nodes. * Some number of surrounding equal (but not ignored) struct fields, slice elements, or map entries are printed for context. * cmp.Equal may declare two sub-reflect.Values to be equal, but are different values when printed. In order to present a unified view on this "equal" node, the logic formats both values and arbitrarily choses the one with the shorter string. * Transformed nodes are formatted with the pseudo-Go syntax of: Inverse(TransformerName, OutputType{...}) where Inverse is some magical pseudo-function that inverts the transformation referred to by TransformerName. The OutputType literal is the output of the transformation. * report_reflect.go: This contains logic to pretty-print reflect.Values and is relied upon by report_compare.go to format the leaves of the tree. Note that the leaves of the tree can be any arbitrary Go type and value (including cyclic data structures). * report_text.go: This contains logic for purely lexicographical formatting and is depended upon by the other report_*.go files. Advantages: * The output is more familiar as it uses pseudo-Go syntax for literals * It provides context about surrounding struct fields, slice elements, or map entries that were equal * Inserted and removed elements in a slice are easier to visualize * Related diffs lie on the same indentation * For diffs in a deeply nested value, the output is easier to visualize than having a list of all the full paths to the diff. Disadvantages: * The implementation is drastically more complex. * In most cases, the output is longer (though more sparse)

The Reporter option allows users to hook in their own custom reporters to programatically interpret the diff structure. The Reporter API uses a push/pop mechanism, which is strictly more powerful than an API that only calls report on leaf nodes. The Reporter.Report method takes in a Result type to provide flexibility in what properties can be reported in the future since new properties can be added, but new methods cannot be easily added to the reporter interface.

These helper options ignore slice elements or map entries based on a user-provided predicate function. These are especially useful for ignoring missing elements or entries.

Rather than representing each path step as an interface, use concrete types instead. This provides some performance benefits as it reduces the amount of virtual function calls and also provides the ability for the compiler to inline method calls. This is technically a breaking change, but since each of the path step interfaces were explicitly implemented in such a way that they couldn't be implemented directly (due to the presence of an unexported method), the only way someone could have been depending on these as interfaces is if they embedded the interface into another interface. Static analysis of all code at Google and publicly available on GitHub shows that this is not a problem. The performance benefits of this change is significant: benchmark old ns/op new ns/op delta BenchmarkBytes/64KiB/EqualFilter0-4 80551394 46592605 -42.16% BenchmarkBytes/64KiB/EqualFilter1-4 102922132 69974509 -32.01% BenchmarkBytes/64KiB/EqualFilter2-4 159009935 94474812 -40.59% BenchmarkBytes/64KiB/EqualFilter3-4 181231264 124601102 -31.25% BenchmarkBytes/64KiB/EqualFilter4-4 189775228 148864070 -21.56% BenchmarkBytes/64KiB/EqualFilter5-4 285065469 175198907 -38.54% benchmark old MB/s new MB/s speedup BenchmarkBytes/64KiB/EqualFilter0-4 1.63 2.81 1.72x BenchmarkBytes/64KiB/EqualFilter1-4 1.27 1.87 1.47x BenchmarkBytes/64KiB/EqualFilter2-4 0.82 1.39 1.70x BenchmarkBytes/64KiB/EqualFilter3-4 0.72 1.05 1.46x BenchmarkBytes/64KiB/EqualFilter4-4 0.69 0.88 1.28x BenchmarkBytes/64KiB/EqualFilter5-4 0.46 0.75 1.63x benchmark old allocs new allocs delta BenchmarkBytes/64KiB/EqualFilter0-4 133 134 +0.75% BenchmarkBytes/64KiB/EqualFilter1-4 134 134 +0.00% BenchmarkBytes/64KiB/EqualFilter2-4 135 135 +0.00% BenchmarkBytes/64KiB/EqualFilter3-4 135 135 +0.00% BenchmarkBytes/64KiB/EqualFilter4-4 136 136 +0.00% BenchmarkBytes/64KiB/EqualFilter5-4 136 136 +0.00% benchmark old bytes new bytes delta BenchmarkBytes/64KiB/EqualFilter0-4 6632417 6632523 +0.00% BenchmarkBytes/64KiB/EqualFilter1-4 6632416 6632464 +0.00% BenchmarkBytes/64KiB/EqualFilter2-4 6632464 6632507 +0.00% BenchmarkBytes/64KiB/EqualFilter3-4 6632502 6632483 -0.00% BenchmarkBytes/64KiB/EqualFilter4-4 6632652 6632668 +0.00% BenchmarkBytes/64KiB/EqualFilter5-4 6632604 6632659 +0.00%

Lists of primitives are a common-enough data structure that it is worth providing specialized diffing for. This provides significantly better readability for strings and byte slices. There is also a heuristic for detecting what a string should be diffed as a multiline string.

This reverts a change introduced in commit 2940eda where cmp stopped calling the String method when printing map keys. Fixes #141

I assume this is a typo, as the left and right side of the boolean-or were identical.

- No need to wrap ToUpper & Fields - Simpler concatenation - Easier pointer access

Previously, the line pointlessly updated the lastLineIdx variable since the subsequent line always assigned to it. Change it to update maxLineLen, which was the original intention.

* Fix IsZero to properly report false for IsZero(-0.0) since we define IsZero as whether it is equal to the zero memory value. * Add note to isLess that we don't need to handle -0.0 since we can't possibly have both keys present in the same map. * Use sort.SliceStable in SortedKeys for deterministic output since it is possible to have both -0.0 and +0.0 from two different maps. The zero key from the left left map will be taken over the right map.

Fixes #167

In the panic message when accessing an unexported field, print the full name of the type for user convenience.

Add an Exporter option that accepts a function to determine which struct types to permit access to unexported fields. Treat this as a first-class option and implement AllowUnexported in terms of the new Exporter option. The new Exporter option: * Better matches the existing style of top-level options both by name (e.g., Comparer, Transformer, and Reporter) and by API style (all accept a function). * Is more flexible as it enables users to functionally implement AllowAllUnexported by simply doing: Exporter(func(reflect.Type) bool { return true }) Fixes #40

Adjust coding style of EquateApproxTime to match EquateApprox.

The EquateErrors helper equates errors according to errors.Is. We also declare a sentinel AnyError value that matches any non-nil error value. This adds a dependency on golang.org/x/xerrors so that we can continue to suppport go1.8, which is our current minimally supported version of Go. Fixes #89

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

The color of the shield banner is "Gopher Blue" as defined by brand book referenced by the Go blog. See https://blog.golang.org/go-brand

Even with an optimal diffing algoritm, coalescing adjacent edit groups may cause the corresponding pair of strings for an edit group to have leading or trailing spans of equal elements. While this is technically a correct representation of a diff, it is a suboptimal outcome. As such, scan through all unequal groups and move leading/trailing equal spans to the preceding/succeeding equal group. Before this change: strings.Join({ "org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=aa", - ",#=_value _value=2 ", + " _value=2 ", `11 org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=bb`, - ",#=_value _value=2 2", + " _value=2 2", `1 org-4747474747474747,bucket-4242424242424242:m,tag1=b,tag2=cc`, - ",#=_value", ` _value=1 21 org-4747474747474747,bucket-4242424242424242:m,tag1`, "=a,tag2", - "=dd,#=_value", + "=dd", ` _value=3 31 org-4747474747474747,bucket-4242424242424242:m,tag1`, - "=c,#=_value", + "=c", ` _value=4 41 `, }, "") After this change: strings.Join({ "org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=aa", - ",#=_value", ` _value=2 11 org-4747474747474747,bucket-4242424242424242:m,tag1`, "=a,tag2=bb", - ",#=_value", ` _value=2 21 org-4747474747474747,bucket-4242424242424242:m,tag1`, "=b,tag2=cc", - ",#=_value", ` _value=1 21 org-4747474747474747,bucket-4242424242424242:m,tag1`, "=a,tag2=dd", - ",#=_value", ` _value=3 31 org-4747474747474747,bucket-4242424242424242:m,tag1`, "=c", - ",#=_value", ` _value=4 41 `, }, "")

Avoid diffing by lines if it turns out to be significantly less efficient than diffing by bytes. Before this change: ( """ - d5c14bdf6bac81c27afc5429500ed750 - 25483503b557c606dad4f144d27ae10b - 90bdbcdbb6ea7156068e3dcfb7459244 - 978f480a6e3cced51e297fbff9a506b7 + Xd5c14bdf6bac81c27afc5429500ed750 + X25483503b557c606dad4f144d27ae10b + X90bdbcdbb6ea7156068e3dcfb7459244 + X978f480a6e3cced51e297fbff9a506b7 """ ) After this change: strings.Join({ + "X", "d5c14bdf6bac81c27afc5429500ed750\n", + "X", "25483503b557c606dad4f144d27ae10b\n", + "X", "90bdbcdbb6ea7156068e3dcfb7459244\n", + "X", "978f480a6e3cced51e297fbff9a506b7\n", }, "")

The previous heuristic of treating strings as binary data if it contains any invalid UTF-8 was too strict. Loosen the heuristic to check if most of the characters are printable text. Fixes #257

Address some minor issues flagged by staticcheck. None of these affect the correctness of the package.

Rename the shadowed variable i to j for better readability.

There are two bugs being fixed: 1. The hueristic for whether a slice of byte looks like text should check whether a rune IsPrint OR IsSpace, and not both. Only a single rune (i.e., U+0020) ever satisfies both conditions. Previously, it would print as: MyBytes{0x68, 0x65, 0x6c, 0x6c, 0x6f} and now it would now print as: MyBytes(MyBytes("hello")) 2. If we're printing as string, then we should set skipType=true since we already explicitly format the value with the type. Previously, it would print as: MyBytes(MyBytes("hello")) and now it would now print as: MyBytes("hello")

Fix textual printing of byte slices

Some aggressive dependency checks flag the use of md5. Switch to sha256 as it accomplishes the same purpose.

Co-authored-by: Damien Neil <neild@users.noreply.github.com>

See golang/go#33232.

The original threshold of 64 was chosen without much thought. Lower it to 32 now that we have some concrete examples that it is aesthetically better. Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Now that Go 1.11 is the minimally supported version, we can drop some local hacks to work around bugs in reflect that were present in Go1.9.

Starting with Go 1.17, //go:build lines are preferred over // +build lines, see https://golang.org/doc/go1.17#build-lines and https://golang.org/design/draft-gobuild for details. This change was generated by running Go 1.17 go fmt ./... which automatically adds //go:build lines based on the existing // +build lines. Also update the corresponding GitHub action to use Go 1.17 gofmt.

When printing a pointer, only elide the type for unnamed pointers. Otherwise, we can run into situations where named and unnamed pointers format the same way in indistinguishable ways. When printing an interview, never skip the interface type. Whether we skip printing the type should be determined by the parent containers, and not locally determined. For examples, interface values within a struct, slice, or map will always be elided since they can be inferred.

If a slice of bytes is mostly text, format them as text instead of as []byte literal with hexadecimal digits. Avoid always printing the type. This is technically invalid Go code, but is unnecessary in many cases since the type is inferred from the parent concrete type. Fixes #272

Versions older than Go 1.13 are no longer in use. Remove unnecessary dependencies.

Now that Go 1.13 is the minimum version, we can use the reflect.Value.IsZero method instead of our own internal/value.IsZero function. Interestingly, our IsZero function pre-dates the IsZero method, but fortunately has the exact same semantics, since both are targetting semantics defined by the Go language specification.

This allows the GoDoc to take advantage of new markup syntax introduced in Go 1.19. This does not require that our minimum supported version be bumped to Go 1.19 since the pkgsite renders our godoc regardless of supported Go version.

Co-authored-by: Damien Neil <neild@users.noreply.github.com>

The value.TypeString function is what the rest of the package uses and is slightly cleaner than using reflect.Type.String. Updates #305 Co-authored-by: Damien Neil <neild@users.noreply.github.com>

* Run tests on Go 1.19 * Format comment Finish the rest of the work for #304 Co-authored-by: Damien Neil <neild@users.noreply.github.com>

If the string has many characters that require escape sequences to print, then we need to take that into consideration and avoid byte-by-byte diffing. Co-authored-by: Damien Neil <neild@users.noreply.github.com>

google-cla · 2022-11-15T04:56:42Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

dsnet · 2023-02-22T20:06:52Z

I'm not sure what this is. Closing as it looks like an accidental PR. Feel free to send another cleaned up PR if this was intended to be a legitimate change.

dsnet and others added 30 commits February 25, 2019 17:37

Rename unsafe_x.go as export_x.go (#117)

7d31622

Rename the file based on what it does rather than how its implemented.

Rename {NDiff,NSame} as {NumDiff,NumSame} (#118)

3e44f05

Update to go1.12 (#120)

ce5a20d

Augment Report to indicate how comparison was determined (#122)

cc11d21

This is necessary for the upcoming reporter re-implementation so that it can format the differences with more intelligence. No tests added in this PR since an upcoming change will heavily test these code paths.

Add IgnoreSliceElements and IgnoreMapEntries helpers (#126)

0376dcf

These helper options ignore slice elements or map entries based on a user-provided predicate function. These are especially useful for ignoring missing elements or entries.

Improve clarity of compareAny (#132)

6f77996

Invoke String when formatting map keys (#142)

917e382

This reverts a change introduced in commit 2940eda where cmp stopped calling the String method when printing map keys. Fixes #141

Document the reason for output instability (#145)

1b31600

Fixed typo in formatDiffList (#148)

00cb0dc

I assume this is a typo, as the left and right side of the boolean-or were identical.

Simplify code (#149)

6d8cafd

- No need to wrap ToUpper & Fields - Simpler concatenation - Easier pointer access

Fix updating of maxLineLen (#147)

208900a

Previously, the line pointlessly updated the lastLineIdx variable since the subsequent line always assigned to it. Change it to update maxLineLen, which was the original intention.

cmpopts: add EquateApproxTime (#158)

b1c9c48

Make retrieveUnexportedField pass Go 1.14's checkptr validation (#169)

481baca

Fixes #167

Print type name in unexported panic (#171)

776445f

In the panic message when accessing an unexported field, print the full name of the type for user convenience.

Adjust style of EquateApproxTime (#177)

3838af3

Adjust coding style of EquateApproxTime to match EquateApprox.

Update README.md to use go.dev for documentation (#190)

5915021

The color of the shield banner is "Gopher Blue" as defined by brand book referenced by the Go blog. See https://blog.golang.org/go-brand

dsnet and others added 27 commits May 24, 2021 17:33

Print as text if mostly text (#258)

d103655

The previous heuristic of treating strings as binary data if it contains any invalid UTF-8 was too strict. Loosen the heuristic to check if most of the characters are printable text. Fixes #257

Fix staticcheck findings (#262)

248ccff

Address some minor issues flagged by staticcheck. None of these affect the correctness of the package.

Avoid shadowing variable (#263)

290a6a2

Rename the shadowed variable i to j for better readability.

Merge pull request #266 from dsnet/fix-format

402949e

Fix textual printing of byte slices

Use sha256 in test (#268)

395a0ac

Some aggressive dependency checks flag the use of md5. Switch to sha256 as it accomplishes the same purpose.

Fix spelling mistakes (#271)

3ee52c8

Change build status badge (#269)

9094ef9

Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Use any alias instead of interface{} (#276)

f1773ad

See golang/go#33232.

Reduce minimum length for specialize string diffing (#275)

6faefd0

The original threshold of 64 was chosen without much thought. Lower it to 32 now that we have some concrete examples that it is aesthetically better. Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Update minimum supported version to go1.11 (#281)

f59cd61

Drop hacks to work around Go reflection bugs in Go1.9 (#282)

3242228

Now that Go 1.11 is the minimally supported version, we can drop some local hacks to work around bugs in reflect that were present in Go1.9.

Run tests on Go 1.18 (#290)

79433ac

remove xerrors (#292)

63c2960

Versions older than Go 1.13 are no longer in use. Remove unnecessary dependencies.

Additional cleanup with Go 1.13 as minimal version (#295)

f144a35

Format with Go 1.19 formatter (#304)

14ad8a0

This allows the GoDoc to take advantage of new markup syntax introduced in Go 1.19. This does not require that our minimum supported version be bumped to Go 1.19 since the pkgsite renders our godoc regardless of supported Go version.

Fix typo in Result documentation (#300)

5dac6aa

Pre-declare global type variables (#302)

f36a68d

Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Use value.TypeString in PathStep.String (#306)

6606d4d

The value.TypeString function is what the rest of the package uses and is slightly cleaner than using reflect.Type.String. Updates #305 Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Run tests on Go 1.19 (#309)

377d283

* Run tests on Go 1.19 * Format comment Finish the rest of the work for #304 Co-authored-by: Damien Neil <neild@users.noreply.github.com>

Adjust heuristic for line-based versus byte-based diffing (#299)

a97318b

If the string has many characters that require escape sequences to print, then we need to take that into consideration and avoid byte-by-byte diffing. Co-authored-by: Damien Neil <neild@users.noreply.github.com>

dsnet closed this Feb 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust heuristic for line-based versus byte-based diffing (#299) #312

Adjust heuristic for line-based versus byte-based diffing (#299) #312

rgosdin commented Nov 15, 2022

google-cla bot commented Nov 15, 2022

dsnet commented Feb 22, 2023

Adjust heuristic for line-based versus byte-based diffing (#299) #312

Adjust heuristic for line-based versus byte-based diffing (#299) #312

Conversation

rgosdin commented Nov 15, 2022

google-cla bot commented Nov 15, 2022

dsnet commented Feb 22, 2023